Hybrid Program Analysis

ABSTRACT

A hybrid program analysis method includes initiating a static program analysis of an application, generating, by a static program analyzer, a query to a dynamic program analyzer upon determining a code construct of the application requiring dynamic analysis, resolving, by the dynamic program analyzer, the query into a set of arguments with which to invoke the code construct of the application, generating, by the dynamic program analyzer, the set of arguments, invoking, by the dynamic program analyzer, the code construct of the application using the set of arguments, answering, by the dynamic program analyzer, the query, and continuing the static program analysis of the application.

BACKGROUND

This disclosure relates to program analysis, and more particularly, to a hybrid program analysis.

The process of program analysis may generally be divided into two groups, static program analysis and dynamic program analysis. In static program analysis, an analysis of computer software may be performed without executing the application being analyzed. In dynamic program analysis, the application is executed on a real or virtual processor using test inputs during an analysis.

Static program analysis is generally considered undecidable according to Rice's theorem. Rice's theorem states that, for any non-trivial property of partial functions, there is no general and effective method to determine whether an algorithm computes a partial function with that property. Rice's theorem not only provides a theoretical upper bound, but also a limitation that is encountered by many analyses of practical interest.

Among these undecidable analyses are the problem of determining a precise set of called methods for a given call site (also known as pointer analysis), the problem of resolving reflective calls, and problems related to string analysis and constant propagation.

Sound solutions for the above problems typically suffer from poor precision. For example, the result of a call (in Java) to Class.newInstance can be approximated as all possible types in the class hierarchy of the subject application. However, the approximation of the result yields an imprecise and un-scalable analysis.

An improved technique has been introduced to perform a two-stage analysis, where a dynamic program analysis is first run to determine dynamic hints for an ensuing static analysis, which may then use the dynamic hints for modeling of challenging code constructs. For example, in the case of Class.newInstance, the dynamic analysis records the exact types of objects allocated by the newInstance call, and then the static program analysis may use this data for pointer analysis to resolve virtual calls. While it is generally understood that such reliance on dynamic program analysis is unsound, the problems targeted by the two-stage analysis are undecidable and sound approximate solutions are often prohibitive in their loss of precision. That is, the two-stage analysis is merely an improved compromise as compared to static program analysis.

BRIEF SUMMARY

According to an embodiment of the present disclosure, a hybrid program analysis method includes initiating a static program analysis of an application, generating, by a static program analyzer, a query to a dynamic program analyzer upon determining a code construct of the application requiring dynamic analysis, passing control from the static program analyzer to the dynamic program analyzer and initiating a dynamic program analysis of the code construct, resolving, by the dynamic program analyzer, the query into a set of arguments with which to invoke the code construct of the application, generating, by the dynamic program analyzer, the set of arguments, invoking, by the dynamic program analyzer, the code construct of the application using the set of arguments, answering, by the dynamic program analyzer, the query, and passing control from the dynamic program analyzer to the static program analyzer and continuing the static program analysis of the application.

According to an embodiment of the present disclosure, a computer program product for performing a hybrid program analysis comprises a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to perform the hybrid program analysis.

According to an embodiment of the present disclosure, a hybrid program analysis system comprises a memory device storing a plurality of instructions embodying the system and an application, and a processor configured to receive the application and execute the plurality of instructions to perform a method comprising initiating a static program analysis of the application, generating, by a static program analyzer, a query to a dynamic program analyzer upon determining a code construct of the application requiring dynamic analysis, resolving, by the dynamic program analyzer, the query into a set of arguments with which to invoke the code construct of the application, generating, by the dynamic program analyzer, the set of arguments, invoking, by the dynamic program analyzer, the code construct of the application using the set of arguments, returning, by the dynamic program analyzer, an answer corresponding to the query to the static program analyzer, and continuing the static program analysis of the application.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Preferred embodiments of the present disclosure will be described below in more detail, with reference to the accompanying drawings:

FIG. 1 is a flow diagram of a routine for purposes of explaining an exemplary embodiment of the present disclosure;

FIG. 2 is a flow diagram of an illustrative hybrid method of program analysis according to an embodiment of the present disclosure;

FIG. 3 is a block diagram depicting an exemplary computer system for performing a hybrid method of program analysis according to an embodiment of the present disclosure;

FIG. 4 is a flow diagram of a routine for purposes of explaining an exemplary embodiment of the present disclosure; and

FIG. 5 is a block diagram depicting an exemplary computer system for performing a hybrid method of program analysis according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

According to an illustrative embodiment of the present disclosure, a framework is implemented for a hybrid method of program analysis including a static program analysis and a dynamic program analysis. It should be understood, however, that embodiments of the disclosure are not limited to the particular methods and/or apparatus described herein. Rather, embodiments of the disclosure are more broadly related to enhanced techniques for performing program analysis. Furthermore, although reference may be made herein to specific software (e.g., Java), syntax, protocols, operating platforms (hardware or software), etc., embodiments of the disclosure are not limited to such software, syntax, protocols, operating platforms, etc. Moreover, it will become apparent to those skilled in the art given the teachings herein that numerous modifications can be made to the embodiments shown that are within the scope of the claimed invention. That is, no limitations with respect to embodiments shown and described herein are intended or should be inferred.

Reference will now be made to an exemplary routine 100 as shown in FIG. 1 for purposes of describing an embodiment of the present disclosure. According to an embodiment of the present disclosure, the hybrid method may provide precise input arguments for use in the dynamic program analysis for runs or executions of given computer readable instructions, with knowledge of which queries the static scanner will place. Having control over the input arguments, the response of the dynamic program analysis is made specific for program runs that are appropriate for answering the query posed by a static analyzer.

Consider the following example (in Java syntax):

    Class c;
    if (*) // see block 101
        c = Class1.class;
    else
        c = Class2.class;
    Object o = c.newInstance(); // see blocks 102a-102b
    Method.invoke(o, "foo"); // see blocks 103a-103b

In this example, the input arguments chosen by a dynamic analyzer may all lead down a selected branch (104 or 105) of a conditional statement 101. Then, when the static analyzer asks about the possible types flowing into Object o, the answer by the dynamic program analysis is Class1 or Class2, according to the branch selected.

In view of the foregoing, and according to an embodiment of the present disclosure, the dynamic program analysis is specialized for a query at hand. The specialization of the dynamic program analysis enables precise information for the specific control flow corresponding to the query to be obtained, as illustrated in the example above. For example, a dynamic program analysis, initiated at the point (e.g., a false branch) where the static program analysis requests an answer, enables a concise and precise answer, as the dynamic analyzer is pointed toward a particular control flow.
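The reflective example above can be sketched in runnable form. The following is a minimal illustration, not the disclosed implementation: the class ReflectiveTypeProbe, its methods, and the empty Class1 and Class2 stand-ins are hypothetical names introduced here, and the non-deterministic condition `*` is modeled as a boolean chosen by the caller.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class ReflectiveTypeProbe {
    // Hypothetical stand-ins for Class1 and Class2 from the example routine.
    public static class Class1 { public void foo() {} }
    public static class Class2 { public void foo() {} }

    // Executes the reflective allocation once for a chosen branch outcome and
    // reports the concrete type that was actually allocated.
    public static String observeAllocatedType(boolean takeTrueBranch) {
        Class<?> c = takeTrueBranch ? Class1.class : Class2.class;
        try {
            Object o = c.getDeclaredConstructor().newInstance();
            return o.getClass().getSimpleName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective allocation failed", e);
        }
    }

    // Answers the query "which types flow into o?" for the branch outcomes
    // that the chosen input arguments exercise.
    public static Set<String> answerQuery(boolean... branchOutcomes) {
        Set<String> types = new LinkedHashSet<>();
        for (boolean outcome : branchOutcomes) {
            types.add(observeAllocatedType(outcome));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(answerQuery(true));        // prints [Class1]
        System.out.println(answerQuery(true, false)); // prints [Class1, Class2]
    }
}
```

When every run is steered down one branch, the observed set contains a single type, which is considerably smaller than the whole class hierarchy that a sound static approximation would report.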

FIG. 2 is a flow diagram of an illustrative hybrid method of program analysis 200 according to an embodiment of the present disclosure. More particularly, with reference to FIG. 2, a hybrid method of program analysis 200 includes a static program analysis of application code at 201. Upon reaching a code construct where a dynamic analyzer is required at 202, a static analyzer submits a query to the dynamic analyzer to retrieve relevant information at 203.

Referring to block 202, any piece of information that affects the precision of the static analysis, but is not modeled in the abstraction maintained by the analysis, can trigger a query. This includes reflective constructs, evaluation of conditional branches, external content (e.g., coming from databases or files), etc.

The relevant information may include possible types allocated by a newInstance statement, etc. The request may be associated with contextual information. The dynamic analyzer resolves the request, along with the contextual information, into one or more sets of arguments with which to invoke the subject application at 204. Examples of these arguments include command-line arguments, or more generally, data inputs, which would lead execution down a desired code path.

Symbolic analysis techniques, such as a demand-driven symbolic analysis for object-oriented programs and frameworks, may be used to resolve the input arguments. For example, the extraction of input arguments may be treated as a goal-reachability problem, wherein semantics of all statements, including inter-procedural flow and exceptional conditions, are modeled. In an exemplary implementation, when the analysis finds a precondition P for postcondition R, the analysis guarantees that any state which satisfies P must necessarily drive program execution to R. No other exceptions will be thrown before reaching R.

In a further exemplary implementation, the goal-reachability problem is based on a backward symbolic analysis. In principle, such an analysis computes weakest preconditions (described herein) over each control-flow path, going backwards from the goal statement to an input argument. If the computed precondition P for any path r is satisfiable, then a satisfying assignment for P gives the input arguments that would force execution along r to the goal.
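As a minimal sketch of the backward computation described above, assume a single path whose statements all have the form x = x + d, and a goal of the form x >= bound. Both restrictions are simplifying assumptions made here for illustration, and the class and method names are hypothetical; a practical analysis would handle general statements with a constraint solver.

```java
final class BackwardPreconditionSketch {
    // Walks the path backwards from the goal "x >= bound" to the input.
    // For one statement, wp(x = x + d, x >= k) is x + d >= k, i.e. x >= k - d.
    public static long weakestLowerBound(int[] deltas, long bound) {
        long required = bound;
        for (int i = deltas.length - 1; i >= 0; i--) {
            required -= deltas[i];
        }
        return required; // precondition P: x >= required
    }

    // Forward (concrete) execution of the same path, used to check the answer.
    public static long execute(int[] deltas, long x) {
        for (int d : deltas) {
            x += d;
        }
        return x;
    }

    public static void main(String[] args) {
        int[] path = {3, -1, 5};                     // x = x + 3; x = x - 1; x = x + 5
        long input = weakestLowerBound(path, 10);    // satisfying assignment for P
        System.out.println(input);                   // prints 3
        System.out.println(execute(path, input));    // prints 10, so the goal x >= 10 holds
        System.out.println(execute(path, input - 1)); // prints 9, so the goal fails
    }
}
```

The smallest satisfying assignment reaches the goal and the next smaller input does not, which is what makes the computed precondition the weakest one for this path.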

The application is then invoked and the query answered by the dynamic analyzer at 205. If, for example, the query is for a possible resolution of a reflective allocation in the routine above, then the answer would be Class1. If the query concerns the evaluation of a conditional branch, then the answer would be true or false. In view of the foregoing, the answer may take various forms. The exemplary answers described herein are not intended to be limiting.

If additional application code is available at 206, the hybrid method 200 continues with the static analyzer at 201. The hybrid method 200 takes application code as input and outputs application properties (see FIG. 3). The application properties may reveal application behaviors, and may include application metrics (e.g., objective, reproducible and quantifiable measurements of application behavior).
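The control flow of FIG. 2 can be sketched as a loop in which the static side scans constructs and hands control to the dynamic side only where a query is needed. This is an illustrative sketch under stated assumptions: the types Construct and Query, the method names, and the use of a string as contextual information are all hypothetical and are not taken from the disclosure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class HybridLoopSketch {
    // Hypothetical model of a code construct as seen by the static analyzer.
    static final class Construct {
        final String name;
        final boolean needsDynamicAnalysis;
        final Function<String[], String> body; // executable stand-in for the construct

        Construct(String name, boolean needsDynamicAnalysis, Function<String[], String> body) {
            this.name = name;
            this.needsDynamicAnalysis = needsDynamicAnalysis;
            this.body = body;
        }
    }

    // A query pairs the construct with contextual information from the static side.
    static final class Query {
        final Construct construct;
        final String context;

        Query(Construct construct, String context) {
            this.construct = construct;
            this.context = context;
        }
    }

    // Dynamic side (blocks 204 and 205): resolve the query into arguments,
    // invoke the construct with them, and return the observed answer.
    public static String answer(Query query) {
        String[] arguments = { query.context };
        return query.construct.body.apply(arguments);
    }

    // Static side (blocks 201 to 203 and 206): scan each construct and issue
    // a query only where the static abstraction does not model the construct.
    public static List<String> analyze(List<Construct> application) {
        List<String> properties = new ArrayList<>();
        for (Construct c : application) {
            if (c.needsDynamicAnalysis) {
                properties.add(c.name + " -> " + answer(new Query(c, "true")));
            } else {
                properties.add(c.name + " -> modeled statically");
            }
        }
        return properties;
    }

    // A two-construct toy application used for demonstration.
    public static List<Construct> demoApplication() {
        return Arrays.asList(
            new Construct("assignment", false, a -> ""),
            new Construct("newInstance", true,
                a -> Boolean.parseBoolean(a[0]) ? "Class1" : "Class2"));
    }

    public static void main(String[] args) {
        // prints "assignment -> modeled statically" then "newInstance -> Class1"
        analyze(demoApplication()).forEach(System.out::println);
    }
}
```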

In view of the foregoing, a dynamic program analysis may be specializedfor a query at hand.

In view of the foregoing, and referring to FIG. 3, at least a portion of an exemplary hybrid program analyzer 301 according to an embodiment of the disclosure includes a static program analyzer 302 and a dynamic program analyzer 303. The static program analyzer 302, in an illustrative embodiment, passes control of the application analysis to a dynamic program analyzer 303 at 202. Similarly, the dynamic program analyzer 303, in an illustrative embodiment, passes control of the application analysis back to the static program analyzer 302 at 206.

In view of FIG. 3, it should be noted that any of the methods described herein can include an additional step of providing a system comprising distinct software modules embodied on one or more tangible computer readable storage media. All the modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example. The modules can include any or all of the components shown in the figures. In a non-limiting example, the modules include a first module, e.g., 302, which scans application code using a static program analysis, a second module, e.g., 303, which receives a query from the first module corresponding to a specific branch of the application, wherein the second module performs a dynamic program analysis in response to the query, and a third module, e.g., the hybrid program analyzer 301, that outputs properties of the application in accordance with the static program analysis and the dynamic program analysis. The method steps can then be carried out using the distinct software modules of the system, as described above, executing on one or more hardware processors.

According to an embodiment of the present disclosure, and referring to the contextual information, an inherent aspect of static program analysis is data abstraction, which enables finite yet sound exploration of the state space of the application. For example, a common abstraction in security analysis is to use access paths to denote untrusted heap regions. The contextual information provided by the static program analysis includes the abstract state at the point where the query is issued. This may be illustrated via the following example of web-application security analysis 400 depicted in FIG. 4:

    String username;
    if (request.hasParameter("name")) { // see block 401
        username = request.getParameter("name");
        username = removeIllegalChars(username); // see branch 404
    } else
        username = "<N/A>"; // see branch 405
    String data = transform(username); // see block 402
    response.getWriter().println(data); // see block 403

In this example, the getParameter call is a security source, which reads (untrusted) user-provided data. Further, the println call is a security sink that renders the data to the response HTML.

A possible query by the static program analysis is whether the data reaching the sink, that is, the println call at block 403, contains certain characters (e.g., illegal characters ‘<’ and ‘>’), in which case the above code is determined to be vulnerable.

A possible abstract state at the sink call is {username.*, data.*}, which denotes that the values pointed-to by username and data are untrusted due to the source call. With this context in place, the dynamic program analysis may synthesize test payloads that pass through the true branch of the conditional statement before arriving at the sink call.
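A payload-driven check of the sink can be sketched as follows. The routine of FIG. 4 is mirrored without a servlet container: a null argument models a request that lacks the "name" parameter. The bodies of removeIllegalChars and transform are not given in the disclosure, so the versions below (stripping ‘<’ and ‘>’, and trimming whitespace) are assumptions made purely for illustration, as is the class name SinkProbe.

```java
final class SinkProbe {
    // Assumed sanitizer: removes the illegal characters '<' and '>'.
    public static String removeIllegalChars(String s) {
        return s.replace("<", "").replace(">", "");
    }

    // Assumed transformation: trims surrounding whitespace only.
    public static String transform(String s) {
        return s.trim();
    }

    // Mirrors the example routine and returns the value that would be
    // passed to println at the sink.
    public static String dataAtSink(String nameParameter) {
        String username;
        if (nameParameter != null) {                        // block 401
            username = removeIllegalChars(nameParameter);   // branch 404
        } else {
            username = "<N/A>";                             // branch 405
        }
        return transform(username);                         // block 402
    }

    // Answers the query: does the data arriving at the sink contain '<' or '>'?
    public static boolean illegalCharsReachSink(String nameParameter) {
        String data = dataAtSink(nameParameter);
        return data.indexOf('<') >= 0 || data.indexOf('>') >= 0;
    }

    public static void main(String[] args) {
        // Test payload steered through the true branch (user-controlled data).
        System.out.println(illegalCharsReachSink("<script>alert(1)</script>")); // prints false
    }
}
```

Under these assumptions the answer for the true branch is false, so the static analysis need not report the flow from getParameter to println. The false branch does emit ‘<’ and ‘>’, but from a constant rather than from user input, which is why the payloads are steered through the true branch.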

According to an embodiment of the present disclosure, and referring to the translation into test input arguments, the contextual information provided by the static analysis constrains, or focuses, the dynamic program analysis in its choice of which execution paths to visit based on data abstraction. The dynamic program analysis may map these constraints into input arguments to the application (see also block 204 in FIG. 2).

Well-known techniques may be used to map constraints into input arguments to the application, such as the weakest-precondition approach for test generation. The weakest-precondition approach attempts to find a solution for the constraint system induced by the contextual information provided by the static analysis, and the path constraints induced by backward traversal, that is, from outputs to input arguments, of the execution path connecting the program's entry location to the queried location. More formally, according to an exemplary implementation of the weakest-precondition approach, given a statement S, the weakest-precondition of S is a function mapping any postcondition R to a precondition. The result of this function, denoted wp(S, R), is the “weakest” precondition on the initial state or input argument ensuring that execution of S terminates in a final state satisfying R.

The weakest precondition semantics may be used to provide the greatest set of possible input arguments leading to the given output observation.
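For reference, the standard weakest-precondition rules for assignment, sequencing, and conditionals may be written as follows. These are the textbook rules rather than rules recited in the disclosure; the symbols x, e, b, S_1, and S_2 are introduced here for illustration, and R[e/x] denotes R with e substituted for x.

```latex
\begin{align*}
wp(x := e,\ R) &= R[e/x] \\
wp(S_1;\, S_2,\ R) &= wp(S_1,\ wp(S_2,\ R)) \\
wp(\mathbf{if}\ b\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2,\ R)
  &= (b \Rightarrow wp(S_1, R)) \land (\lnot b \Rightarrow wp(S_2, R))
\end{align*}
```

As a worked instance, for the statement y := x + 1 and the postcondition y > 0, the assignment rule gives wp(y := x + 1, y > 0) = (x + 1 > 0), that is, x > -1. Every input satisfying x > -1 leads to the observation y > 0, and no input outside that set does.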

It should be understood that the term application as used herein may refer to individual statements and declarations in computer readable code, individual objects, complete source code of an application, etc. Similarly, embodiments described herein are not limited to source code and may be applied to object code. In summary, embodiments of the present disclosure are not limited to the analysis of certain levels or types of code and may be implemented in any case where program analysis is applicable.

The methodologies of embodiments of the disclosure may be particularly well-suited for use in an electronic device or alternative system. Accordingly, embodiments of the present disclosure may take the form of an entirely hardware embodiment or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “processor”, “circuit,” “module” or “system.” Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code stored thereon.

Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be a computer readable storage medium. A computer readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.

Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Embodiments of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.

These computer program instructions may be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

For example, FIG. 5 is a block diagram depicting an exemplary computer system for performing a hybrid method of program analysis according to an embodiment of the present disclosure. The computer system shown in FIG. 5 includes a processor 501, memory 502, signal source 503, system bus 504, Hard Drive (HD) controller 505, keyboard controller 506, serial interface controller 507, parallel interface controller 508, display controller 509, hard disk 510, keyboard 511, serial peripheral device 512, parallel peripheral device 513, and display 514.

In these components, the processor 501, memory 502, signal source 503, HD controller 505, keyboard controller 506, serial interface controller 507, parallel interface controller 508, and display controller 509 are connected to the system bus 504. The hard disk 510 is connected to the HD controller 505. The keyboard 511 is connected to the keyboard controller 506. The serial peripheral device 512 is connected to the serial interface controller 507. The parallel peripheral device 513 is connected to the parallel interface controller 508. The display 514 is connected to the display controller 509.

In different applications, some of the components shown in FIG. 5 can be omitted. The whole system shown in FIG. 5 is controlled by computer readable instructions, which are generally stored as software in the hard disk 510, EPROM or other non-volatile storage. The software can be downloaded from a network (not shown in the figures) and stored in the hard disk 510. Alternatively, software downloaded from a network can be loaded into the memory 502 and executed by the processor 501 so as to complete the function determined by the software.

The processor 501 may be configured to perform one or more methodologies described in the present disclosure, illustrative embodiments of which are shown in the above figures and described herein. Embodiments of the present disclosure can be implemented as a routine that is stored in memory 502 and executed by the processor 501 to process the signal from the signal source 503. As such, the computer system is a general-purpose computer system that becomes a specific purpose computer system when executing the routine of the present disclosure.

Although the computer system described in FIG. 5 can support methods according to the present disclosure, this system is only one example of a computer system. Those skilled in the art should understand that other computer system designs can be used to implement the present invention.

It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a central processing unit (CPU) and/or other processing circuitry (e.g., digital signal processor (DSP), microprocessor, etc.). Additionally, it is to be understood that the term “processor” may refer to a multi-core processor that contains multiple processing cores in a processor or more than one processing device, and that various elements associated with a processing device may be shared by other processing devices.

The term “memory” as used herein is intended to include memory and other computer-readable media associated with a processor or CPU, such as, for example, random access memory (RAM), read only memory (ROM), fixed storage media (e.g., a hard drive), removable storage media (e.g., a diskette), flash memory, etc. Furthermore, the term “I/O circuitry” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, etc.) for entering data to the processor, and/or one or more output devices (e.g., printer, monitor, etc.) for presenting the results associated with the processor.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Although illustrative embodiments of the present disclosure have been described herein with reference to the accompanying drawings, it is to be understood that the disclosure is not limited to those precise embodiments, and that various other changes and modifications may be made therein by one skilled in the art without departing from the scope of the appended claims.

What is claimed is:
 1. A hybrid program analysis method comprising: initiating a static program analysis of an application; generating, by a static program analyzer, a query to a dynamic program analyzer upon determining a code construct of the application requiring dynamic analysis; passing control from the static program analyzer to the dynamic program analyzer and initiating a dynamic program analysis of the code construct; resolving, by the dynamic program analyzer, the query into a set of arguments with which to invoke the code construct of the application; generating, by the dynamic program analyzer, the set of arguments; invoking, by the dynamic program analyzer, the code construct of the application using the set of arguments; answering, by the dynamic program analyzer, the query; and passing control from the dynamic program analyzer to the static program analyzer and continuing the static program analysis of the application.
 2. The hybrid method of claim 1, wherein the query includes contextual information for the code construct.
 3. The hybrid method of claim 1, wherein the code construct of the application requiring dynamic analysis is identified as affecting a precision of the static analysis, and further wherein the code construct is not modeled in an abstraction maintained by the static analysis.
 4. The hybrid method of claim 1, wherein the code construct of the application requiring dynamic analysis is identified by the static analysis as a reflective construct.
 5. The hybrid method of claim 1, wherein the code construct of the application requiring dynamic analysis is identified by the static analysis as an evaluation of a conditional branch in the application.
 6. The hybrid method of claim 1, wherein the code construct of the application requiring dynamic analysis is identified by the static analysis as external content.
 7. The hybrid method of claim 1, further comprising resolving, by the dynamic program analyzer, the query with the contextual information into the set of arguments.
 8. The hybrid method of claim 1, wherein the set of arguments includes a command-line argument corresponding to an identified branch of the application.
 9. The hybrid method of claim 1, wherein the set of arguments includes a data input corresponding to an identified branch of the application.